PAPER Special Section on VLSI Design and CAD Algorithms

# A New Single-Clock Flip-Flop for Half-Swing Clocking

## Young-Su KWON<sup> $\dagger$ </sup>, In-Cheol PARK<sup> $\dagger$ </sup>, and Chong-Min KYUNG<sup> $\dagger$ </sup>, Nonmembers

**SUMMARY** A new flip-flop configuration for half-swing clocking is proposed to save total clocking power. In the proposed scheme, only NMOS's are clocked with the half-swing clock so that the flip-flop operates without the level converters or additional logic used in earlier half-swing clocking schemes.  $V_{cc}$  is supplied to the random logic circuits and flip-flops while  $V_{cc}/2$  is supplied to the clock network and some parts of the flip-flop to reduce the power consumed in the clock network. Compared to the conventional flip-flop working with a half-swing clock, the proposed scheme saves about 40% of the total clocking power.

**key words:** low-power circuit, clocking power, half-swing clocking

#### 1. Introduction

Low power consumption is the most critical factor in current VLSI design, especially in the design of portable devices. There have been numerous approaches to reducing the power consumed in component blocks such as memories, control units and datapath units. However, reducing the power consumption of these blocks does not have a great effect on the total chip power because they consume a relatively small amount of power compared to the clocking power.

Figure 1 shows the distribution of power consumption in several chips. Although the power distribution differs from chip to chip, the power consumed in clocking is substantial and takes about 20-30% of the total chip power. As an example, the clocking system consumes up to 50% of the total power in the DEC Alpha processor, whose clock load is 3.75 nF [1]. The reason for this large power consumption of the clocking system is twofold. First, the switching activity of the clock is very high, namely 100%, while that of the other parts is under 60% on average. Second, the clock drives a large number of flip-flops. Most of the power of the clock distribution network is associated with the flip-flops.

The double-edge triggered flip-flop [4] can be used to reduce the switching activity of the clock network. Since the flip-flop accepts data at both edges of the clock, the switching activity of the clock can be reduced by half for the same data rate. Therefore, the power consumed in the clock network becomes half of the original power consumption. Another approach is based on half-swing clocking schemes. One scheme was presented in [8], where the clock circuitry swings by half of  $V_{cc}$  and full  $V_{cc}$  is supplied to the other logic. It requires four clock signals to be generated by the clock driver circuit: two upper-swing clocks are fed to PMOS's and the other two lower-swing clocks are fed to NMOS's. These four clock signals not only increase the clock interconnection capacitance but also make clock line routing disadvantageous in terms of area and skew adjustment. Moreover, this scheme needs a special clock driver circuit that requires large capacitors to be made on the chip or outside of the chip. Another half-swing clocking scheme changes the well voltage of PMOS's [9]. In this flip-flop configuration, PMOS's are driven by the half-swing clock and their wells are connected to a voltage higher than  $V_{cc}$ .

**Fig. 1** The distribution of the power consumption in several chips.

Manuscript received March 15, 1999.

Manuscript revised June 11, 1999.

<sup>$\dagger$</sup>The authors are with the Department of Electrical Engineering, Korea Advanced Institute of Science and Technology, 373–1, Kusong-dong, Yusong-gu, Taejon, 305–701, Korea.

In this paper, we propose a new flip-flop configuration that can operate with the half-swing clock. The proposed flip-flop accepts the half-swing clock directly and thus does not need the level converters required in the earlier schemes to convert the half-swing clock to a full-swing clock. Our approach is grounded on the idea that a flip-flop can operate with the half-swing clock if no PMOS's are driven by the clock. Such a flip-flop can lower the clock network switching power to a quarter of that of a flip-flop with the full-swing clock. Besides, the area of the flip-flop is not much different from that of the conventional one, and the speed degradation is small because, except for the circuits related to the half-swing clocking, the voltage swings of the other circuits are the same as in conventional flip-flops.
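The power figures quoted in this section (half for double-edge triggering, a quarter for half-swing clocking) are consistent with the usual first-order model of dynamic switching power, in which power scales with load capacitance, voltage swing, supply voltage and toggle frequency. The following is a minimal sketch under that assumed model; the function name is hypothetical, and the numeric values are borrowed from Sect. 4 purely for illustration.

```python
def clock_switching_power(c_load, v_swing, v_supply, freq, activity=1.0):
    """First-order dynamic power model (an assumption of this sketch):
    P = activity * C * V_swing * V_supply * f."""
    return activity * c_load * v_swing * v_supply * freq


VCC = 3.3        # supply voltage in volts (value used in Sect. 4)
C_CLK = 440e-15  # clock load per flip-flop in farads (value assumed in Sect. 4)
F_CLK = 100e6    # clock frequency in hertz (value used in Sect. 4)

# Full-swing clock driven from Vcc.
full_swing = clock_switching_power(C_CLK, VCC, VCC, F_CLK)
# Half-swing clock driven from a Vcc/2 supply: swing and supply both halve.
half_swing = clock_switching_power(C_CLK, VCC / 2, VCC / 2, F_CLK)
# Double-edge triggering: same data rate at half the clock frequency.
double_edge = clock_switching_power(C_CLK, VCC, VCC, F_CLK / 2)

print(f"half-swing / full-swing  = {half_swing / full_swing:.2f}")
print(f"double-edge / full-swing = {double_edge / full_swing:.2f}")
```

Under this model the ratios do not depend on the chosen capacitance or frequency, so any positive values give the same result.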

This paper is organized as follows. In Sect. 2, the proposed flip-flop configuration is presented in detail. In Sect. 3, the characteristics of the proposed flip-flop and the comparisons to the other schemes are described. Experimental results are shown in Sect. 4.

#### 2. Proposed Half-Swing Clocking Flip-Flop

Figure 2 compares the configurations of the conventional flip-flop and the proposed one. The conventional flip-flop shown in Fig. 2 (a) has 22 transistors and does not require dual-rail inputs, whereas the proposed configuration shown in Fig. 2 (b) cascades two n-RAM type latches, to which a half-swing clock can safely be applied because only NMOS's are clocked. Reducing the voltage swing of the clock to half of  $V_{cc}$  is effective in saving a great amount of the power consumed in the clock network and clock driver buffers.

The n-RAM type latch used in the master and slave is similar to the CVSL latch, which is a dual-rail dynamic circuit [5], [6]. One variation of the CVSL latch, called DSTC, is almost the same as the CVSL latch except that one transistor is clocked to reduce power consumption [7]. The static version of the DSTC latch shown in [7] uses PMOS's in the slave and is very sensitive to input glitches when in the hold state. The proposed flip-flop based on the n-RAM type static latch can operate with a half-swing clock and does not need PMOS's in the clocking. Although its operation depends on the transistor ratio, it is robust to input glitches.

(b) The proposed flip-flop

Fig. 2 Proposed flip-flop for half-swing clocking.

The proposed flip-flop, composed of two latches, operates as follows. When the clock is high, "CLKM" is turned on because  $V_{cc}/2$  is greater than the NMOS threshold voltage. Therefore, a current path through "CLKM" is created, and the "xb" node is discharged when "Db" is high and vice versa. One of the "x" and "xb" nodes is discharged and the other node is charged. "Q" and "Qb" do not change because "clkb" is low and "CLKS" is off. The values of "Q" and "Qb" are maintained by a static latch pair. At the moment when the clock changes from high to low, "CLKM" turns off and "Q" and "Qb" are determined by the values of "x" and "xb" at that moment. When the clock is low, the values of "Q" and "Qb" do not change because "CLKM" is off and "x" and "xb" cannot change. To turn on "CLKM" or "CLKS,"  $V_{cc}/2$  must be greater than  $V_T$ , and the width of the transistor "CLKS" must be enlarged to increase the discharge current of the slave. The master's supply voltage is  $V_{cc}/2$  because "x" and "xb" do not need to swing to full  $V_{cc}$ . Therefore, the transistor size of "CLKM" need not be large. The waveform of the positive edge triggered case, obtained by HSPICE simulation, is shown in Fig. 3.
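The sequence of events described above can be sketched as a purely behavioral model. This is an illustration only: it abstracts the "x"/"xb" node pair into a single stored bit, ignores node polarities, transistor ratios and analog effects, and the class and function names are hypothetical. The default voltages are the values quoted in Sects. 3 and 4.

```python
from dataclasses import dataclass


@dataclass
class HalfSwingFlipFlopModel:
    """Behavioral sketch: master follows D while the clock is high,
    slave takes the master's value while the clock is low."""
    vcc: float = 3.3   # supply voltage in volts (value used in Sect. 4)
    vtn: float = 0.77  # NMOS threshold in volts (value quoted in Sect. 3)
    master: int = 0    # bit held by the master (stands in for x/xb)
    q: int = 0         # bit held by the slave (the Q output)

    def __post_init__(self):
        # The clocked NMOS's only turn on if Vcc/2 exceeds the threshold.
        if self.vcc / 2 <= self.vtn:
            raise ValueError("half-swing clock cannot turn on the clocked NMOS")

    def step(self, clk: int, d: int) -> int:
        if clk:
            # CLKM on, CLKS off: master tracks D, Q is held.
            self.master = d
        else:
            # CLKM off, CLKS on: master is frozen, slave copies it to Q.
            self.q = self.master
        return self.q


def simulate(clk_wave, d_wave):
    """Run the model over sampled clock and data waveforms."""
    ff = HalfSwingFlipFlopModel()
    return [ff.step(c, d) for c, d in zip(clk_wave, d_wave)]


clk = [1, 1, 0, 0, 1, 1, 0, 0]
d = [1, 1, 1, 0, 0, 0, 0, 1]
q = simulate(clk, d)
print(q)  # [0, 0, 1, 1, 1, 1, 0, 0]
```

In this trace, "Q" changes only when the clock goes low, and changes of "D" during the low phase (samples 3 and 7) do not reach the output.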

Note that there are small glitches on the "Q" output, which result from the signal skew between "D" and "Db." Since "D" and "Db" can be high simultaneously for a short time, "x" and "xb" can change even when the clock is low, as shown in Fig. 5(a). A current path created through the transistors "D" and "Db" may change "x" and "xb" and then change "Q" and "Qb." To remove the glitches, we can use another configuration shown in Fig. 4. Two clocked transistors are intentionally inserted between the static latch pair and the "D" and "Db" transistors to break the current path. However, the slave's configuration does not need to be changed because the slave guarantees no glitches if the master does not have glitches. The waveform of "x" of the glitch-free flip-flop is shown in Fig. 5 (b).
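The skew condition that causes the glitches, "D" and "Db" being high at the same time, can be located in sampled waveforms with a short helper. This is a hedged sketch for illustration; the function name and the sample data are hypothetical and are not taken from the HSPICE runs.

```python
def overlap_intervals(times, d, db):
    """Return (start, end) pairs of sample times during which both
    D and Db are high in the given sampled waveforms."""
    intervals = []
    start = None
    for t, a, b in zip(times, d, db):
        both_high = bool(a and b)
        if both_high and start is None:
            start = t                      # overlap begins
        elif not both_high and start is not None:
            intervals.append((start, t))   # overlap ends
            start = None
    if start is not None:
        # Overlap still active at the last sample.
        intervals.append((start, times[-1]))
    return intervals


# D rises at t = 2 while Db falls one sample later (illustrative skew).
times = [0, 1, 2, 3, 4, 5]
d_wave = [0, 0, 1, 1, 1, 1]
db_wave = [1, 1, 1, 0, 0, 0]
windows = overlap_intervals(times, d_wave, db_wave)
print(windows)  # [(2, 3)]
```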



Fig. 3 Waveform of the proposed flip-flop operation.



Fig. 4 The proposed glitch-free flip-flop



(b) For the glitch-free flip-flop in Fig. 4

Fig. 5 The waveform of "x" when two configurations are used.

#### 3. Characteristics of the Proposed Flip-Flop

We have measured the delay time of the proposed flip-flop and the conventional flip-flop shown in Fig. 2 (a), as the delay of a flip-flop has a great effect on the critical path delay. The propagation delay of the proposed flip-flop depends on the width of the "D" and "Db" transistors and the "CLKS" transistor of the slave. The transistor width of "CLKS" ( $W_{CLKS}$ ) should be large enough to sink the discharge current needed to change the state of the inverter pair. The width of the "D" and "Db" transistors ( $W_D$ ) is the same as that of the "CLKS" transistor. In  $0.6\,\mu\mathrm{m}$ technology where  $V_{tn} = 0.77 \text{ V}$  and  $V_{tp} = 0.56 \text{ V}$ ,  $8 \,\mu\text{m}$  is proper for the width of these transistors. Figure 6 shows the change of the delay time when the load capacitance varies from 19 fF to 100 fF. The increment of the propagation delay is almost independent of the load capacitance for loads of practical size. The delay increment is composed of two factors. The first is caused by the poor driving capability of the master latch, and the second is caused by the buffer that drives the load. The input capacitance of the buffer does not change much with the load and is generally not greater than 45 fF. The input capacitance of the buffer which can drive up to 32 standard loads is 45 fF. When the load capacitance is 20 fF, the delay increment is about 0.9 ns as shown in Fig. 6. Since full  $V_{cc}$  is supplied to the other circuits, the critical path delay increases only by the increase of the flip-flop propagation delay, which is 0.9 ns in the proposed scheme.

**Fig. 6** Propagation delay of the proposed flip-flop and the conventional flip-flop.  $(V_{tn} = 0.77 \text{ V}, V_{tp} = 0.56 \text{ V}, W_{clk} = W_d = 8 \,\mu\text{m})$ 

The setup time of the conventional flip-flop depends on the path delay of the master's inverter chain, while in the proposed flip-flop it depends on how fast the "CLKM" transistor and the "D" transistor change the state of the inverter pair. We measured the setup time of the conventional flip-flop and the proposed flip-flop. The widths of the "CLKM" and "D" transistors are  $4 \,\mu\text{m}$ . The setup time was 0.4 ns for the conventional flip-flop and 0.7 ns for the proposed flip-flop. This slight increase results from the time required to flip the nodes of the master's inverter pair.
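To put the two timing penalties in perspective, they can be combined under the standard register-to-register constraint, in which the clock period must cover the flip-flop propagation delay, the logic delay and the setup time. The sketch below assumes that constraint and the 100 MHz clock used in Sect. 4; adding the two increments together is an illustrative estimate rather than a figure reported in the measurements, and the function name is hypothetical.

```python
def logic_budget_loss_ns(delay_increase_ns, setup_conv_ns, setup_prop_ns):
    """Time removed from the combinational logic budget, assuming
    T_clk >= t_propagation + t_logic + t_setup."""
    return delay_increase_ns + (setup_prop_ns - setup_conv_ns)


# Values quoted in this section.
loss_ns = logic_budget_loss_ns(
    delay_increase_ns=0.9,  # propagation delay increment
    setup_conv_ns=0.4,      # setup time, conventional flip-flop
    setup_prop_ns=0.7,      # setup time, proposed flip-flop
)

clock_mhz = 100.0             # clock frequency used in Sect. 4
period_ns = 1e3 / clock_mhz   # 10 ns period

print(f"{loss_ns:.1f} ns, {100 * loss_ns / period_ns:.0f}% of the period")
```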

(a) Conventional flip-flop

(b) Proposed flip-flop

Fig. 7 Layout of (a) a conventional flip-flop and (b) the proposed flip-flop.  $W_{clk} = 8 \,\mu\text{m}$  and the well for  $V_{cc}/2$  is specified in the figure.

Figure 7 shows layout examples of the conventional flip-flop and the proposed flip-flop. The proposed flip-flop has a well connected to the  $V_{cc}/2$  line, as the master and the internal clock buffer require  $V_{cc}/2$ . Since there are more NMOS's than PMOS's, the areas of the P-well and N-well are unbalanced and thus the N-well has a staircase shape. The area of (a) is  $57.6 \times 38.2 \,\mu\text{m}^2$  and that of (b) is  $47.6 \times 43.8 \,\mu\text{m}^2$ . The area is thus reduced by about 5% compared to the conventional flip-flop, but this reduction may be canceled out by the  $V_{cc}/2$  routing overhead.
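The 5% figure follows directly from the layout dimensions quoted above, as the short calculation below shows. The variable names are chosen for illustration only.

```python
# Layout dimensions in micrometres, as quoted for Fig. 7.
conventional_area = 57.6 * 38.2  # layout (a)
proposed_area = 47.6 * 43.8      # layout (b)

# Fractional area saved by the proposed flip-flop.
reduction = 1 - proposed_area / conventional_area

print(f"conventional: {conventional_area:.1f} um^2")
print(f"proposed:     {proposed_area:.1f} um^2")
print(f"reduction:    {100 * reduction:.1f}%")
```

This gives roughly 2200 and 2085 square micrometres, a reduction of a little over 5%, before any  $V_{cc}/2$  routing overhead is considered.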

#### 4. Experimental Results and Comparison

Fig. 8 Simulation environment.

Fig. 9 Measured power consumption of several schemes.

Since the clock network capacitance and the number of flip-flops depend on the system under consideration, the clocking system is not easy to model. According to the distribution of power consumption in high performance chips such as the Mpact Media Processor reported in [11], the power consumed in the clock interconnection and clock buffer cells is about twice that of the flip-flops. When we simulated the conventional flip-flop with randomly generated D inputs, the average power was 119.8  $\mu$ W under 0.6  $\mu$ m technology, 100 MHz clock frequency and 3.3 V  $V_{cc}$ . Therefore, using the power distribution ratio reported for the Mpact Media Processor, we assumed that the average capacitance of the clock network per flip-flop is 440 fF. The clock network capacitance is 4.4 pF for 10 flip-flops as shown in Fig. 8.
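The 440 fF estimate can be cross-checked from the numbers given above. The text does not state which power model was used; the sketch below shows that the figure is reproduced by  $P = \tfrac{1}{2} C V_{cc}^2 f$ , which is therefore an assumption of this example, and it prints the result of the alternative  $P = C V_{cc}^2 f$  model for comparison.

```python
VCC = 3.3              # supply voltage in volts
F_CLK = 100e6          # clock frequency in hertz
P_FLIP_FLOP = 119.8e-6 # simulated average flip-flop power in watts

# Clock interconnect and buffers consume about twice the flip-flop power.
p_clock_network = 2 * P_FLIP_FLOP

# Assumed model P = 1/2 * C * V^2 * f, solved for C.
c_half_model = 2 * p_clock_network / (VCC ** 2 * F_CLK)
# Alternative model P = C * V^2 * f, solved for C.
c_full_model = p_clock_network / (VCC ** 2 * F_CLK)

print(f"P = 1/2 C V^2 f -> {c_half_model * 1e15:.1f} fF per flip-flop")
print(f"P = C V^2 f     -> {c_full_model * 1e15:.1f} fF per flip-flop")
print(f"10 flip-flops   -> {10 * c_half_model * 1e12:.2f} pF")
```

With the first model the result is about 440 fF per flip-flop and 4.4 pF for 10 flip-flops, matching the values used in the simulation environment.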

The power consumptions obtained from HSPICE simulation for various flip-flop configurations are shown in Fig. 9. The half-swing clock can be applied only to the proposed scheme. SSTC is the static version of the CVSL latch [7] and STSL is the flip-flop used for low power applications such as StrongARM [10]. The power consumed in these flip-flops is similar to that in the proposed scheme. Strictly speaking, the proposed flip-flop consumes slightly more power because of the large width of "CLKS" required to draw the discharge current. However, the proposed scheme reduces the clock network power to a quarter of that in the full-swing clock cases.

Three possible schemes for half-swing clocking are shown in Fig. 10. In half-swing clocking using conventional flip-flops, level converters are essential to generate the full-swing clock that is required to turn off the PMOS's. In Fig. 10 (a), the conventional flip-flop is used together with a level converter that produces a full-swing clock. In Fig. 10 (b) [8], the clock driver generates four clock signals that are fed to the conventional flip-flop. The four clock signals make the clock skew problem worse. The proposed scheme shown in Fig. 10 (c) works directly with a half-swing clock. Since it does not need level converters, it is simpler than the others.

|                                                             | Conventional with half-swing clock | Previous half-swing clocking scheme (including clock drivers) | The proposed scheme             |
|-------------------------------------------------------------|------------------------------------|---------------------------------------------------------------|---------------------------------|
| Clock network & buffer cells (special clock driver in (b))  | $234.1\,\mu\text{W}$               | $401.9\,\mu\text{W}$                                          | $216.8\,\mu\text{W}$            |
| Flip-flop                                                   | $298.3\,\mu\text{W}$               | $315.5\,\mu\text{W}$                                          | $351.2\,\mu\text{W}$            |
| Level converter                                             | $378.9\,\mu\text{W}$               |                                                               |                                 |
| Total power consumption (%)                                 | $911.3\,\mu\text{W}$ (100%)        | $717.4\,\mu\text{W}$ (78.7%)                                  | $568.0\,\mu\text{W}$ (62.3%)    |

Table 1 Power consumption of three half-swing clocking schemes shown in Fig. 10.



(a) A conventional flip-flop with half-swing clock



(b) Previous half-swing clocking scheme



(c) The proposed scheme

Fig. 10 Comparison of half-swing clocking schemes.

The three configurations shown in Fig. 10 were simulated under the same conditions except that the clock frequency was 20 MHz. As the clock generator proposed in [8], which was used in Fig. 10 (b), did not swing to full  $V_{cc}$  at 100 MHz, we have used a lower clock frequency. In (b), we assume that the average capacitance of the network per flip-flop is 110 fF, which is a quarter of that of the conventional flip-flop, because there are four clock nets and thus the number of clocked transistors per clock net is smaller. The result is shown in Table 1. The power consumptions of the flip-flops are almost the same. Although we assume a smaller capacitance in (b), the power consumed in the clock driver and the clock network is larger than in the other schemes because the power consumed in the clock driver is large. Moreover, four clock signals must be generated and the clock skew problem becomes worse in (b).
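The totals and percentages in Table 1 can be recomputed from the individual entries. The short script below does so using only the values listed in the table; the dictionary keys are labels chosen for illustration.

```python
# Power components in microwatts, copied from Table 1.
table_1 = {
    "conventional with half-swing clock": {
        "clock_network_and_buffers": 234.1,
        "flip_flop": 298.3,
        "level_converter": 378.9,
    },
    "previous half-swing scheme [8]": {
        "clock_network_and_buffers": 401.9,
        "flip_flop": 315.5,
    },
    "proposed scheme": {
        "clock_network_and_buffers": 216.8,
        "flip_flop": 351.2,
    },
}

# Sum the components of each scheme.
totals = {name: sum(parts.values()) for name, parts in table_1.items()}
baseline = totals["conventional with half-swing clock"]

for name, total in totals.items():
    relative = 100 * total / baseline
    print(f"{name}: {total:.1f} uW, {relative:.1f}% of baseline, "
          f"saving {100 - relative:.1f}%")
```

The recomputed savings are 21.3% for the previous half-swing scheme and 37.7% for the proposed scheme, the latter being the value summarized as about 40%.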

#### 5. Conclusion

To reduce clocking power, we proposed a new flip-flop configuration that can accept a half-swing clock directly without attaching level converters and special clock drivers. In the proposed flip-flop, the half-swing clock is connected to only NMOS's to eliminate level converters required to drive PMOS's. In a  $0.6 \,\mu m$  technology, simulation results using HSPICE show that the proposed scheme saves about 40% of total clocking power compared to the conventional flip-flop working with the half-swing clock, while the previous half-swing clocking scheme saves about 21%. In addition, the layout area of the proposed flip-flop is 5% smaller than that of the conventional flip-flop.

#### References

- [1] B.J. Benschneider, et al., "A 300-MHz 64-b quad-issue CMOS RISC microprocessor," IEEE J. Solid-State Circuits, vol.30, no.11, pp.1203-1211, Nov. 1995.
- [2] H. Kojima, S. Tanaka, Y. Okada, T. Hikage, F. Nakazawa, H. Matsushige, H. Miyasaka, and S. Hanamura, "A multicycle operational signal processing core for an adaptive equalizer," VLSI Signal Process., VI, pp.150–158, Oct. 1993.
- [3] K. Yano, et al., "A 3.8 ns CMOS  $16 \times 16$  multiplier using complementary pass-transistor logic," IEEE J. Solid-State Circuits, vol.25, no.2, pp.388-395, April 1990.
- [4] R. Hossain, L.D. Wronski, and A. Albicki, "Low power design using double edge triggered flip-flops," IEEE Trans. VLSI Systems, vol.2, no.2, pp.261-265, 1994.
- [5] K.M. Chu and D.L. Pulfrey, "A comparison of CMOS circuit techniques: Differential cascode voltage switch logic versus conventional logic," IEEE J. Solid-State Circuits, vol.22, no.4, pp.528-532, 1987.
- [6] D.R. Renshaw and C.H. Lau, "Race-free clocking of CMOS pipelines using a simple global clock," IEEE J. Solid-State Circuits, vol.25, pp.766-769, 1990.
- [7] J. Yuan and C. Svensson, "New single-clock CMOS latches and flip-flops with improved speed and power savings," IEEE J. Solid-State Circuits, vol.32, no.1, pp.62-69, 1997.
- [8] H. Kojima, S. Tanaka, and K. Sasaki, "Half-swing clocking scheme for 75% power saving in clocking circuitry," IEEE J. Solid-State Circuits, vol.30, no.4, pp.432-435, 1995.
- [9] H. Kawaguchi and T. Sakurai, "A reduced clock-swing flip-flop (RCSFF) for 63% power reduction," IEEE J. Solid-State Circuits, vol.33, no.5, pp.807-811, 1998.
- [10] J. Montanaro, et al., "A 160 MHz 32b 0.5 W CMOS RISC microprocessor," ISSCC Digest of Technical Papers, pp.214-215, 1996.
- [11] K. Usami, et al., "Automated low-power technique exploiting multiple supply voltages applied to a media processor," IEEE J. Solid-State Circuits, vol.33, no.3, pp.463–471, 1998.



**Young-Su Kwon** received the B.S. and M.S. degrees in Electrical Engineering from KAIST (Korea Advanced Institute of Science and Technology), Korea in 1997 and 1999, respectively. He is currently pursuing the Ph.D. degree in Electrical Engineering at KAIST. His current research interests include low-power circuit and graphics hardware design.



**In-Cheol Park** received the B.S. degree in Electrical Engineering from Seoul National University in 1986, and the M.S. and Ph.D. degrees in Electrical Engineering from KAIST (Korea Advanced Institute of Science and Technology), in 1988 and 1992, respectively. From May 1995 to May 1996, he worked at IBM T.J. Watson Research Center, Yorktown, New York as a postdoctoral member of the technical staff in the area of circuit design. He joined KAIST in June 1996 as an Assistant Professor in the Department of Electrical Engineering. His current research interests include CAD algorithms for high-level synthesis and VLSI architectures for general-purpose microprocessors.



**Chong-Min Kyung** received the B.S. degree in Electronic Engineering from Seoul National University, Korea, in 1975, and the M.S. and Ph.D. degrees in Electrical Engineering from KAIST (Korea Advanced Institute of Science and Technology), Korea, in 1977 and 1981, respectively. After graduation from KAIST, he worked at AT&T Bell Laboratories, Murray Hill, NJ, from April 1981 to January 1983 in the area of semiconductor device and process simulation. In February 1983, he joined the Department of Electrical Engineering at KAIST, where he is now a Professor. His current research interests include microprocessor/DSP architecture, chip design and verification methodology. He is Director of the IDEC (Integrated Circuit Design Education Center), established to promote VLSI design education in Korean universities through CAD environment setup, chip fabrication services, and the provision of various educational materials and media related to integrated circuits and systems design.